WEIGHTED EMPIRICAL LIKELIHOOD IN SOME TWO-SAMPLE
SEMIPARAMETRIC MODELS WITH VARIOUS TYPES 
OF CENSORED DATA 

By Jian-Jian Ren

University of Central Florida 

In this article, the weighted empirical likelihood is applied to a
general setting of two-sample semiparametric models, which includes
biased sampling models and case-control logistic regression models as
special cases. For various types of censored data, such as right censored
data, doubly censored data, interval censored data and partly
interval-censored data, the weighted empirical likelihood-based
semiparametric maximum likelihood estimator $(\tilde\theta_n,\tilde F_n)$
for the underlying parameter $\theta_0$ and distribution $F_0$ is derived,
and the strong consistency of $(\tilde\theta_n,\tilde F_n)$ and the
asymptotic normality of $\tilde\theta_n$ are established.
Under biased sampling models, the weighted empirical log-likelihood
ratio is shown to have an asymptotic scaled chi-squared distribution
for the aforementioned censored data. For right censored data, doubly
censored data and partly interval-censored data, it is shown that
$\sqrt{n}(\tilde F_n - F_0)$ converges weakly to a centered Gaussian process,
which leads to a consistent goodness-of-fit test for the case-control
logistic regression models.

1. Introduction. Consider the following two-sample semiparametric model:

$$\begin{aligned}
&X_1,\ldots,X_{n_0} \text{ is a random sample with density } f_0(x),\\
&Y_1,\ldots,Y_{n_1} \text{ is a random sample with density } g_0(x)=\varphi(x;\theta_0)f_0(x),
\end{aligned}\tag{1.1}$$

where the two samples are independent, and $\varphi(x;\theta_0)$ is a known function
with $x\in\mathbb{R}$ and a unique unknown parameter $\theta_0\in\mathbb{R}^q$, while $f_0$ and $g_0$ are
the density functions of unknown nonnegative distribution functions (d.f.)
$F_0$ and $G_0$, respectively. This model (1.1) includes biased sampling models
(Vardi [32]) and case-control logistic regression models (Prentice and Pyke
[22]) as special cases, for which there has not been any published work
dealing with censored data. In this article, we study model (1.1) when at
least one of the two samples is not completely observable due to censoring.
In what follows, we use random sample $X_1,\ldots,X_{n_0}$ to illustrate the censoring
models under consideration here, while Examples 1 and 2 discuss biased
sampling models and case-control logistic regression models, respectively.



Right censored sample. The observed data are $O_i=(V_i,\delta_i)$, $1\le i\le n_0$,
with

$$V_i=\begin{cases}X_i, & \text{if } X_i\le C_i,\ \delta_i=1,\\ C_i, & \text{if } X_i>C_i,\ \delta_i=0,\end{cases}\tag{1.2}$$

where $C_i$ is the right censoring variable and is independent of $X_i$. This type
of censoring has been extensively studied in the literature over the past few
decades.
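
As a small illustration of the right-censoring scheme (1.2), the following Python sketch builds the observed pairs $(V_i,\delta_i)$ from latent values. The function name `right_censor` and the toy inputs are assumptions made for this example; they are not part of the original article.

```python
def right_censor(x, c):
    """Return the observed pairs (V_i, delta_i) following (1.2)."""
    observed = []
    for xi, ci in zip(x, c):
        if xi <= ci:
            observed.append((xi, 1))   # X_i <= C_i: the value itself is seen
        else:
            observed.append((ci, 0))   # X_i > C_i: only the censoring time is seen
    return observed


# Toy latent values and censoring times (illustrative only).
sample = right_censor([1.0, 3.0, 2.5], [2.0, 2.0, 2.5])
```

The doubly censored scheme (1.3) could be coded in the same way by adding a left censoring value and a third indicator level.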

Doubly censored sample. The observed data are $O_i=(V_i,\delta_i)$, $1\le i\le n_0$,
with

$$V_i=\begin{cases}X_i, & \text{if } D_i<X_i\le C_i,\ \delta_i=1,\\ C_i, & \text{if } X_i>C_i,\ \delta_i=2,\\ D_i, & \text{if } X_i\le D_i,\ \delta_i=3,\end{cases}\tag{1.3}$$

where $C_i$ and $D_i$ are right and left censoring variables, respectively, and
they are independent of $X_i$ with $P\{D_i<C_i\}=1$. This type of censoring has
been considered by Turnbull [31], Chang and Yang [4], Gu and Zhang [11]
and Mykland and Ren [17], among others. One recent example of doubly
censored data was encountered in a study of primary breast cancer (Ren
and Peer [28]).

Interval censored sample.

Case 1. The observed data are $O_i=(C_i,\delta_i)$, $1\le i\le n_0$, with

$$\delta_i=I\{X_i\le C_i\}.\tag{1.4}$$

Case 2. The observed data are $O_i=(C_i,D_i,\delta_i)$, $1\le i\le n_0$, with

$$\delta_i=\begin{cases}1, & \text{if } D_i<X_i\le C_i,\\ 2, & \text{if } X_i>C_i,\\ 3, & \text{if } X_i\le D_i,\end{cases}\tag{1.5}$$

where $C_i$ and $D_i$ are independent of $X_i$ and satisfy $P\{D_i<C_i\}=1$ for Case
2. These two types of interval censoring were considered by Groeneboom
and Wellner [10], among others. In practice, interval censored Case 2 data
were encountered in AIDS research (Kim, De Gruttola and Lagakos [16]; see
discussion in Ren [26]).

Partly interval-censored sample.

"Case 1" partly interval-censored data. The observed data are

$$O_i=\begin{cases}X_i, & \text{if } 1\le i\le k_0,\\ (C_i,\delta_i), & \text{if } k_0+1\le i\le n_0,\end{cases}\tag{1.6}$$

where $\delta_i=I\{X_i\le C_i\}$ and $C_i$ is independent of $X_i$.

General partly interval-censored data. The observed data are

$$O_i=\begin{cases}X_i, & \text{if } 1\le i\le k_0,\\ (C,\delta_i), & \text{if } k_0+1\le i\le n_0,\end{cases}\tag{1.7}$$

where for $N$ potential examination times $C_1<\cdots<C_N$, letting $C_0=0$
and $C_{N+1}=\infty$, we have $C=(C_1,\ldots,C_N)$ and $\delta_i=(\delta_i^{(1)},\ldots,\delta_i^{(N+1)})$ with
$\delta_i^{(j)}=1$ if $C_{j-1}<X_i\le C_j$, and $0$ elsewhere. This means that for intervals
$(0,C_1],(C_1,C_2],\ldots,(C_N,\infty)$, we know in which one of them $X_i$ falls. These
two types of partly interval-censoring were considered by Huang [12], among
others. As pointed out by Huang [12], in practice the general partly
interval-censored data were encountered in the Framingham Heart Disease Study (Odell,
Anderson and D'Agostino [18]), and in the study on incidence of proteinuria
in insulin-dependent diabetic patients (Enevoldsen et al. [5]).
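
The indicator vector $\delta_i$ in (1.7) can be made concrete with a short Python sketch. The helper name `interval_indicator` and the examination times used below are hypothetical and serve only to illustrate the definition.

```python
from bisect import bisect_left


def interval_indicator(x, exam_times):
    """Return (delta^(1), ..., delta^(N+1)) for one latent value x > 0.

    exam_times is the increasing list C_1 < ... < C_N. With C_0 = 0 and
    C_{N+1} = infinity, entry j equals 1 exactly when C_{j-1} < x <= C_j.
    """
    j = bisect_left(exam_times, x)          # number of C_k strictly below x
    delta = [0] * (len(exam_times) + 1)
    delta[j] = 1
    return delta
```

For example, with examination times 1, 2, 3 a latent value of 2.5 falls in $(C_2,C_3]$, so only the third entry is 1.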



Example 1 (Biased sampling model). In (1.1), let

$$\varphi(x;\theta_0)=\theta_0\,w(x),\qquad \theta_0\in\mathbb{R},\tag{1.8}$$

where $w(x)$ is a weight function with positive value on the support of $F_0$, and
$\theta_0=1/w_0$ is the weight parameter satisfying $w_0=\int_0^\infty w(x)\,dF_0(x)$. Then,
(1.1) is a two-sample biased sampling problem, for which the case with
length-biased distribution $G_0$, that is, $w(x)=x$ in (1.8), was considered by Vardi
[32], and the empirical log-likelihood ratio for the mean of $F_0$ was shown to
have an asymptotic chi-squared distribution by Qin [23]. More general biased
sampling models were considered by Vardi [33] and by Gill, Vardi and Wellner [9],
who discussed various application examples, and showed that the maximum
likelihood estimator for $F_0$ is asymptotically Gaussian and efficient. For right
censored samples in (1.1), Vardi [33] gave an estimator for $F_0$ based on the
EM algorithm, but the asymptotic properties of the estimator were not
studied. Below, we discuss practical examples of the biased sampling problem
with censored data.
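
To make (1.8) concrete, the sketch below tilts a discrete $F_0$ by a weight function and recovers $\theta_0=1/w_0$. The function name `biased_distribution` and the three-point distribution are assumptions chosen for illustration only.

```python
def biased_distribution(points, masses, w):
    """Tilt a discrete F_0 by w as in (1.8): g_0(x) = theta_0 * w(x) * f_0(x)."""
    w0 = sum(w(x) * p for x, p in zip(points, masses))   # w_0 = integral of w dF_0
    theta0 = 1.0 / w0
    tilted = [theta0 * w(x) * p for x, p in zip(points, masses)]
    return theta0, tilted


# Length-biased case w(x) = x on a toy three-point distribution.
theta0, g0 = biased_distribution([1.0, 2.0, 3.0], [0.5, 0.3, 0.2], lambda x: x)
```

The tilted masses sum to one by construction, which is the discrete counterpart of $\int\varphi(x;\theta_0)\,dF_0(x)=1$.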

In Patil and Rao [20], the biased sampling problem is discussed in the
context of efficiency of early screening for disease. Using our notation in
(1.1), if $F_0$ is the d.f. of the duration of the preclinical state of a certain
chronic disease, then the first sample in (1.1) is taken from those whose
clinical state is detected by the usual medical care. If at a certain point in
time some individuals in the preclinical state begin participating in an early
detection program, then such a program identifies them by length-biased
sampling. In other words, the second sample in (1.1) is taken from those
who participated in the early detection program, and $G_0$ is a length-biased
distribution. However, in reality a usual screening program for "disease" is
conducted by examining an individual periodically with a fixed length of
time between two consecutive check-ups. The data encountered in such a
screening program are typically a doubly censored sample (1.3); that is, the
actually observed data for the second sample in (1.1) are doubly censored.
In the statistical literature, examples of doubly censored data encountered in
screening programs have been given by Turnbull [31] and Ren and Gu [27],
among others.

Example 2 (Case-control logistic regression model). In (1.1), let

$$\varphi(x;\theta_0)=e^{\alpha_0+\beta_0x},\qquad F_0(x)=P\{T\le x\mid Z=0\},\qquad G_0(x)=P\{T\le x\mid Z=1\};\tag{1.9}$$

then under reparameterization by Qin and Zhang [24], model (1.1) is
equivalent to the following case-control logistic regression model (Prentice and
Pyke [22]):

$$P\{Z=1\mid T=x\}=\frac{\exp(\alpha^*+\beta_0x)}{1+\exp(\alpha^*+\beta_0x)},\tag{1.10}$$

where $\theta_0=(\alpha_0,\beta_0)\in\mathbb{R}^2$, $Z$ is the binary response variable (with value 1 or
0 to indicate presence or absence of a disease or occurrence of an event of
interest), $T$ is the covariate variable, and $(\alpha^*,\beta_0)$ is the regression parameter
satisfying $\alpha_0=\alpha^*+\ln[(1-\pi)/\pi]$ for $\pi=P\{Z=1\}$. Qin and Zhang [24]
established asymptotic normality of the semiparametric maximum likelihood
estimators (SPMLE) for $(\theta_0,F_0)$ in (1.9) with two complete samples in (1.1),
and provided a goodness-of-fit test for (1.10). Below, we discuss an example
to illustrate the situation with censored covariate variable $T$.
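
The equivalence between (1.9) and (1.10) can be checked numerically. The sketch below computes $P\{Z=1\mid T=x\}$ once by Bayes' rule from the density ratio $g_0/f_0=\varphi(x;\theta_0)$, and once from the logistic form with $\alpha^*=\alpha_0-\ln[(1-\pi)/\pi]$. The function names and parameter values are illustrative assumptions.

```python
import math


def posterior_by_bayes(x, alpha0, beta0, pi):
    """P{Z=1 | T=x} from Bayes' rule, using g_0(x)/f_0(x) = exp(alpha0 + beta0*x)."""
    ratio = math.exp(alpha0 + beta0 * x)            # varphi(x; theta_0) in (1.9)
    return pi * ratio / (pi * ratio + (1.0 - pi))


def posterior_by_logistic(x, alpha0, beta0, pi):
    """P{Z=1 | T=x} from (1.10), with alpha* recovered from alpha0 and pi."""
    alpha_star = alpha0 - math.log((1.0 - pi) / pi)
    eta = alpha_star + beta0 * x
    return math.exp(eta) / (1.0 + math.exp(eta))
```

Both functions agree for any $x$, which is the reparameterization used to pass between the two-sample model and the logistic regression model.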

In the example of early detection of breast cancer considered by Ren and
Gu [27], $T$ is the age at which the tumor could be detected when screening
mammogram is the only detection method, and based on a series of screening
mammograms the observed data on $T$ are doubly censored. This example
is part of a study on the effectiveness of screening mammograms; see Ren
and Peer [28] for a precise description of left and right censored observations.
Here, to study the effects of screening mammograms on survival, we
consider those individuals who had breast cancer, and let $Z=1$ represent death
due to breast cancer within 5 years of diagnosis; $Z=0$, otherwise. Then
under (1.9), for those "dead" (i.e., $Z=1$) the second sample in (1.1) is taken
from the available screening mammogram records; thus the actually observed
data from $G_0(x)=P\{T\le x\mid Z=1\}$ are a doubly censored sample. Similarly,
for those "survived" (i.e., $Z=0$) the first sample in (1.1) is also taken from
screening mammogram records, and thus is also a doubly censored sample. Fitting
the logistic regression model (1.10) with these two doubly censored
case-control samples, we obtain $P\{Z=1\mid T=x_0\}$, which is the probability of
"death" for an individual whose tumor was detected by screening
mammogram at age $x_0$.

In this article, we apply weighted empirical likelihood (Ren [25]) to model
(1.1) with the following two independent samples for $n=n_0+n_1$:

$$\begin{aligned}
&O_1^X,\ldots,O_{n_0}^X \text{ is the observed sample for sample } X_1,\ldots,X_{n_0},\\
&O_1^Y,\ldots,O_{n_1}^Y \text{ is the observed sample for sample } Y_1,\ldots,Y_{n_1},
\end{aligned}\tag{1.11}$$

where the $O_i^X$'s or $O_j^Y$'s are possibly one of those censored samples described
above, and we denote by $\hat F$ and $\hat G$ the nonparametric maximum likelihood
estimators (NPMLE) for $F_0$ and $G_0$ based on the $O_i^X$'s and $O_j^Y$'s, respectively.
Section 2 provides a heuristic explanation of the concept of weighted
empirical likelihood. For the censored data (1.2)-(1.7), Section 3
derives the weighted empirical likelihood-based SPMLE $(\tilde\theta_n,\tilde F_n)$ for $(\theta_0,F_0)$,
and establishes the strong consistency of $(\tilde\theta_n,\tilde F_n)$ and the asymptotic
normality of $\tilde\theta_n$, while Section 4 further discusses Example 1 on biased sampling
models, and shows that the weighted empirical log-likelihood ratio has an
asymptotic scaled chi-squared distribution. For right censored data, doubly
censored data and partly interval-censored data, Section 3 also shows that
$\sqrt{n}(\tilde F_n-F_0)$ weakly converges to a centered Gaussian process, while
Section 5 further discusses Example 2 on case-control logistic regression models,
and provides a consistent goodness-of-fit test.

We note that the weighted empirical likelihood approach used in this
article can be adapted to deal with more general biased sampling models.
Also note that based on Ren and Gu [27], our results here on the case-control
logistic regression models can be extended to a $k$-dimensional ($k>1$) covariate
$T$, where $T$ contains one component that is subject to right censoring or
double censoring.

For interval censored data (1.4)-(1.5), the weighted empirical likelihood
approach enables us to obtain the strong consistency of the SPMLE $(\tilde\theta_n,\tilde F_n)$,
the asymptotic normality of $\tilde\theta_n$, and the limiting distribution of the
log-likelihood ratio via the asymptotic results on the NPMLE $\hat F$ or $\hat G$ for
interval censored data by Groeneboom and Wellner [10] and Geskus and
Groeneboom [6], among others. However, the techniques used in our proofs show
that the weak convergence of $\tilde F_n$ for interval censored data relies on that of
$\hat F$ or $\hat G$ for interval censored data, which is currently unknown.

2. Weighted empirical likelihood. For random sample $X_1,\ldots,X_{n_0}$ from
d.f. $F_0$, the empirical likelihood function (Owen [19]) is given by
$L(F)=\prod_{i=1}^{n_0}[F(X_i)-F(X_i-)]$, where $F$ is any d.f. The weighted empirical
likelihood function in Ren [25] may be understood as follows.

For each type of censored data mentioned above, the likelihood function
has been given in the literature, and the NPMLE $\hat F$ for $F_0$ is the solution which
maximizes the likelihood function. Moreover, it is shown that from observed
censored data $\{O_i^X;\ 1\le i\le n_0\}$, there exist $m_0$ distinct points
$W_1^X<W_2^X<\cdots<W_{m_0}^X$ along with $\hat p_j^X>0$, $1\le j\le m_0$, such that $\hat F$ can be expressed
as $\hat F(x)=\sum_{i=1}^{m_0}\hat p_i^X I\{W_i^X\le x\}$ for the above right censored data (Kaplan and
Meier [15]), doubly censored data (Mykland and Ren [17]), interval censored
data Case 1 and Case 2 (Groeneboom and Wellner [10]) and partly
interval-censored data (Huang [12]). Since in all these cases $\hat F$ is shown to be a strongly
uniformly consistent estimator for $F_0$ under some suitable conditions, we may
expect a random sample $X_1^*,\ldots,X_{n_0}^*$ taken from $\hat F$ to behave asymptotically
the same as $X_1,\ldots,X_{n_0}$. If $F_{n_0}^*$ denotes the empirical d.f. of $X_1^*,\ldots,X_{n_0}^*$,
then from $\hat F\approx F_{n_0}^*$ we have

$$\begin{aligned}
\prod_{i=1}^{n_0}P\{X_i=x_i\}
&\approx\prod_{i=1}^{n_0}P\{X_i^*=x_i^*\}
=\prod_{j=1}^{m_0}\bigl(P\{X_1=W_j^X\}\bigr)^{k_j}\\
&=\prod_{j=1}^{m_0}\bigl(P\{X_1=W_j^X\}\bigr)^{n_0[F_{n_0}^*(W_j^X)-F_{n_0}^*(W_j^X-)]}\\
&\approx\prod_{j=1}^{m_0}\bigl(P\{X_1=W_j^X\}\bigr)^{n_0[\hat F(W_j^X)-\hat F(W_j^X-)]}
=\prod_{j=1}^{m_0}\bigl(P\{X_1=W_j^X\}\bigr)^{n_0\hat p_j^X},
\end{aligned}$$

where $k_j=n_0[F_{n_0}^*(W_j^X)-F_{n_0}^*(W_j^X-)]$. Thus, the weighted empirical
likelihood function (Ren [25])

$$\hat L(F)=\prod_{i=1}^{m_0}\bigl[F(W_i^X)-F(W_i^X-)\bigr]^{n_0\hat p_i^X}\tag{2.1}$$

may be viewed as the asymptotic version of the empirical likelihood function
$L(F)$ for censored data. When there is no censoring, $\hat L(F)$ coincides with
$L(F)$.
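
The following Python sketch illustrates (2.1) for right censored data, where the NPMLE is the Kaplan-Meier estimator. It computes the jump points and jump sizes and then evaluates the logarithm of the weighted empirical likelihood at candidate masses. The function names are hypothetical, and the sketch assumes indicators coded as in (1.2); it is not the author's implementation.

```python
import math


def kaplan_meier_jumps(v, delta):
    """Jump points W_j and jump sizes p_hat_j of the Kaplan-Meier estimator."""
    data = sorted(zip(v, delta))
    at_risk, surv = len(data), 1.0
    points, jumps = [], []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [d for (u, d) in data[i:] if u == t]   # all observations at time t
        deaths = sum(tied)                            # delta = 1 marks an uncensored value
        if deaths > 0:
            new_surv = surv * (1.0 - deaths / at_risk)
            points.append(t)
            jumps.append(surv - new_surv)
            surv = new_surv
        at_risk -= len(tied)
        i += len(tied)
    return points, jumps


def weighted_log_el(n0, npmle_jumps, candidate_masses):
    """Logarithm of (2.1): sum over j of n0 * p_hat_j * log(p_j)."""
    return sum(n0 * ph * math.log(p) for ph, p in zip(npmle_jumps, candidate_masses))
```

With no censoring and distinct observations every jump equals $1/n_0$, so `weighted_log_el` reduces to the ordinary empirical log-likelihood $\sum_i\ln p_i$, in line with the closing remark of this section.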



3. SPMLE and asymptotic results. This section derives the semiparametric
maximum likelihood estimator for $(\theta_0,F_0)$ in (1.1) using censored
data (1.11), and studies related asymptotic properties.

As general notation throughout this paper, let $\hat F$ and $\hat G$ be the NPMLE
for $F_0$ and $G_0$ in (1.1) based on observed censored data $O_1^X,\ldots,O_{n_0}^X$ and
$O_1^Y,\ldots,O_{n_1}^Y$ in (1.11), respectively. From Section 2, we know that there
exist distinct points $W_1^X<\cdots<W_{m_0}^X$ and $W_1^Y<\cdots<W_{m_1}^Y$ with $\hat p_i^X>0$
and $\hat p_j^Y>0$ such that $\hat F$ and $\hat G$ can be expressed as

$$\hat F(x)=\sum_{i=1}^{m_0}\hat p_i^X I\{W_i^X\le x\}\quad\text{and}\quad \hat G(x)=\sum_{j=1}^{m_1}\hat p_j^Y I\{W_j^Y\le x\},\tag{3.1}$$

respectively, for the censored data mentioned above. We also let

$$\begin{aligned}
(W_1,\ldots,W_m)&=(W_1^X,\ldots,W_{m_0}^X,W_1^Y,\ldots,W_{m_1}^Y),\\
(\hat p_1,\ldots,\hat p_m)&=(\hat p_1^X,\ldots,\hat p_{m_0}^X,\hat p_1^Y,\ldots,\hat p_{m_1}^Y),\\
(\omega_1,\ldots,\omega_m)&=(\rho_0\hat p_1^X,\ldots,\rho_0\hat p_{m_0}^X,\rho_1\hat p_1^Y,\ldots,\rho_1\hat p_{m_1}^Y),
\end{aligned}\tag{3.2}$$

where $m=m_0+m_1$, $\rho_0=n_0/n$ and $\rho_1=n_1/n$.

To derive an estimator for $(\theta_0,F_0)$ using both samples in (1.11), we apply
weighted empirical likelihood function (2.1) to model (1.1), and obtain

$$\begin{aligned}
&\Bigl(\prod_{i=1}^{m_0}\bigl[F(W_i^X)-F(W_i^X-)\bigr]^{n_0\hat p_i^X}\Bigr)\Bigl(\prod_{j=1}^{m_1}\bigl[G(W_j^Y)-G(W_j^Y-)\bigr]^{n_1\hat p_j^Y}\Bigr)\\
&\qquad=\Bigl(\prod_{i=1}^{m_0}\bigl[F(W_i^X)-F(W_i^X-)\bigr]^{n_0\hat p_i^X}\Bigr)\times\Bigl(\prod_{j=1}^{m_1}\bigl(\varphi(W_j^Y;\theta_0)\bigl[F(W_j^Y)-F(W_j^Y-)\bigr]\bigr)^{n_1\hat p_j^Y}\Bigr).
\end{aligned}$$

Thus, from (3.2) the weighted empirical likelihood function for model (1.1)
is given by

$$L(\theta,F)=\Bigl(\prod_{i=1}^{m}p_i^{\,n\omega_i}\Bigr)\Bigl(\prod_{j=m_0+1}^{m}\bigl[\varphi(W_j;\theta)\bigr]^{n\omega_j}\Bigr)\qquad\text{for } p_i=F(W_i)-F(W_i-),\tag{3.3}$$

and the SPMLE $(\tilde\theta_n,\tilde F_n)$ for $(\theta_0,F_0)$ is the solution that maximizes $L(\theta,F)$.
One may note that the use of weighted empirical likelihood function (2.1)
here provides a simple and direct way to incorporate the model assumption
of (1.1) in the derivation of likelihood function (3.3) for censored data. Also
note that using the usual likelihood functions for specific types of censored
data would result in a much more complicated likelihood function which is
very difficult to handle.

To find $(\tilde\theta_n,\tilde F_n)$, we need to solve the following optimization problem:

$$\begin{aligned}
&\max\ \Bigl(\prod_{i=1}^{m}p_i^{\,n\omega_i}\Bigr)\Bigl(\prod_{j=m_0+1}^{m}\bigl[\varphi(W_j;\theta)\bigr]^{n\omega_j}\Bigr)\\
&\text{subject to } p_i\ge 0,\quad \sum_{i=1}^{m}p_i=1,\quad \sum_{i=1}^{m}p_i\,\varphi(W_i;\theta)=1,
\end{aligned}\tag{3.4}$$

where the last constraint reflects the fact that $\varphi(x;\theta)[F(x)-F(x-)]$ is
a distribution function. Note that the NPMLE for censored data (1.2)-(1.7)
is not always a proper d.f. (Mykland and Ren [17]). But for the
moment, we assume $\sum_{i=1}^{m_0}\hat p_i^X=\sum_{j=1}^{m_1}\hat p_j^Y=1$ in (3.1), which will not be
needed later on for our main results of the paper. To solve (3.4), we first
maximize $L(\theta,p)$ with respect to $p=(p_1,\ldots,p_m)$ for fixed $\theta$, then
maximize $l(\theta)=\ln L(\theta,\tilde p)=\max_p\ln L(\theta,p)$ over $\theta$ to find $\tilde\theta_n$. Noting that for
$U_i(\theta)=\varphi(W_i;\theta)$, constraints in (3.4) imply $\sum_{i=1}^{m}p_i[U_i(\theta)-1]=0$, we know
that $\theta$ must satisfy

$$[U_{(1)}(\theta)-1]<0<[U_{(m)}(\theta)-1],\tag{3.5}$$

where $U_{(1)}(\theta)$ and $U_{(m)}(\theta)$ denote the smallest and largest of the $U_i(\theta)$.
Using the Lagrange multiplier method, it can be shown that for any fixed $\theta$
satisfying (3.5), the concavity of $\ln L(\theta,p)$ in $p$ ensures that $L(\theta,p)$ is uniquely
maximized by $L(\theta,\tilde p)$ (see pages 90-91 and 164 of Bazaraa, Sherali and
Shetty [1]), where

$$\tilde p_i=\frac{\omega_i}{1+\lambda(\theta)[U_i(\theta)-1]},\qquad i=1,\ldots,m,\tag{3.6}$$

with $\lambda(\theta)$ as the unique solution on interval $\bigl(-[U_{(m)}(\theta)-1]^{-1},\,-[U_{(1)}(\theta)-1]^{-1}\bigr)$ for

$$0=\psi(\lambda;\theta)=\sum_{i=1}^{m}\frac{\omega_i[U_i(\theta)-1]}{1+\lambda[U_i(\theta)-1]}.\tag{3.7}$$

Thus, we have $l(\theta)=n\sum_{i=1}^{m}\omega_i\ln\tilde p_i+n\sum_{j=m_0+1}^{m}\omega_j\ln\varphi(W_j;\theta)$.
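
For fixed $\theta$, the inner maximization reduces to a one-dimensional root search for the multiplier in (3.7). The sketch below uses bisection, which is an assumption of this illustration (the article does not prescribe a numerical method at this step); the function name `tilted_masses` and its inputs are likewise hypothetical.

```python
def tilted_masses(omega, u, iterations=200):
    """Solve (3.7) for lambda by bisection and return (lambda, p_tilde) from (3.6).

    omega: weights omega_i summing to 1.
    u:     values U_i(theta) = varphi(W_i; theta).
    Requires min(u) < 1 < max(u), which is condition (3.5).
    """
    lo = -1.0 / (max(u) - 1.0)      # left end of the admissible interval
    hi = -1.0 / (min(u) - 1.0)      # right end of the admissible interval

    def psi(lam):
        return sum(w * (ui - 1.0) / (1.0 + lam * (ui - 1.0)) for w, ui in zip(omega, u))

    for _ in range(iterations):     # psi is strictly decreasing on (lo, hi)
        mid = 0.5 * (lo + hi)
        if psi(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    p_tilde = [w / (1.0 + lam * (ui - 1.0)) for w, ui in zip(omega, u)]
    return lam, p_tilde
```

At the root, the returned masses satisfy both constraints of (3.4): they sum to one and $\sum_i\tilde p_iU_i(\theta)=1$.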

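For our examples, we have $\theta_0\in\mathbb{R}$ or $\theta_0\in\mathbb{R}^2$ in (1.1), and for some
functions $h_1(\theta)$ and $h_2(x)$, the following assumption holds for $\varphi(x;\theta)$ with
$\theta\in\mathbb{R}$ or $\theta\in\mathbb{R}^2$:

(AS0) $\nabla\varphi(x;\theta)=\varphi(x;\theta)\,h_1(\theta)\,(1,h_2(x))^T$ for $\nabla=(\partial/\partial\theta_1,\partial/\partial\theta_2)^T$,

where $0<h_1(\theta)\in\mathbb{R}$ is twice differentiable for $\theta\in\Theta$; $0\le h_2(x)\in\mathbb{R}$ is
monotone for $x\ge 0$; in the case $\theta\in\mathbb{R}$, we have degenerating $h_2(x)\equiv 0$; in the
case $\theta\in\mathbb{R}^2$, we always have strictly monotone $h_2(x)$ on the support of $F_0$.
Throughout this paper, our notations mean that for the case $\theta\in\mathbb{R}$, only the
nondegenerating component in equations, vectors and matrices is
meaningful. To maximize $l(\theta)$, from (3.2), (3.6)-(3.7), $\psi(\lambda(\theta);\theta)=0$ and constraints
in (3.4), we obtain that under assumption (AS0):

$$\begin{aligned}
\frac{\partial l}{\partial\theta_1}&=-n\lambda(\theta)h_1(\theta)\sum_{i=1}^{m}\tilde p_i\,\varphi(W_i;\theta)+nh_1(\theta)\sum_{j=m_0+1}^{m}\omega_j
=nh_1(\theta)[\rho_1-\lambda(\theta)],\\
\frac{\partial l}{\partial\theta_2}&=nh_1(\theta)\Bigl(\rho_1\sum_{j=m_0+1}^{m}\hat p_j\,h_2(W_j)-\lambda(\theta)\sum_{i=1}^{m}\tilde p_i\,\varphi(W_i;\theta)\,h_2(W_i)\Bigr),
\end{aligned}\tag{3.8}$$

where the use of $\nabla\lambda(\theta)$ in deriving (3.8) can easily be justified by the
theorems on implicit functions in mathematical analysis. If $\tilde\theta_n$ is a solution of
$\nabla l(\theta)=0$, then

$$\lambda(\tilde\theta_n)=\rho_1\quad\text{and}\quad\sum_{j=m_0+1}^{m}\hat p_j\,h_2(W_j)-\sum_{i=1}^{m}\tilde p_i\,\varphi(W_i;\tilde\theta_n)\,h_2(W_i)=0.\tag{3.9}$$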

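In the Appendix, we show that $\tilde\theta_n$ is equivalently given by the solution of
equation(s):

$$\begin{aligned}
0&=g_1(\theta)=\int_0^\infty\frac{\varphi(x;\theta)}{\rho_0+\rho_1\varphi(x;\theta)}\,d\hat F(x)-\int_0^\infty\frac{1}{\rho_0+\rho_1\varphi(x;\theta)}\,d\hat G(x),\\
0&=g_2(\theta)=\int_0^\infty\frac{h_2(x)\,\varphi(x;\theta)}{\rho_0+\rho_1\varphi(x;\theta)}\,d\hat F(x)-\int_0^\infty\frac{h_2(x)}{\rho_0+\rho_1\varphi(x;\theta)}\,d\hat G(x),
\end{aligned}\tag{3.10}$$

by which we always mean that $\tilde\theta_n\in\mathbb{R}$ is the solution of $g_1(\theta)=0$ if $h_2(x)\equiv 0$.
For our examples, the unique existence of solution $\tilde\theta_n$ for (3.10) is shown
in Sections 4 and 5, respectively, and it can be shown that $\tilde\theta_n$ maximizes
$l(\theta)$ over those $\theta$ satisfying (3.5) (the proofs are omitted). Thus, $\tilde\theta_n$ is the
SPMLE for $\theta_0$ in (1.1). Consequently, replacing $\theta$ by $\tilde\theta_n$ in (3.6), we obtain
the following SPMLE $\tilde F_n$ for $F_0$:

$$\tilde F_n(t)=\sum_{i=1}^{m}\tilde p_i\,I\{W_i\le t\}=\int_0^t\frac{1}{\rho_0+\rho_1\varphi(x;\tilde\theta_n)}\,d[\rho_0\hat F(x)+\rho_1\hat G(x)].\tag{3.11}$$

Since the equations in (3.10) depend only on the NPMLE $\hat F$ and $\hat G$,
for the rest of the paper $\tilde\theta_n$ denotes the solution of (3.10) without
assumption $\sum_{i=1}^{m_0}\hat p_i^X=\sum_{j=1}^{m_1}\hat p_j^Y=1$ in (3.1), and is used to compute $\tilde F_n$ in
(3.11). In the following theorems, some asymptotic results on $(\tilde\theta_n,\tilde F_n)$ are
established under some of the assumptions listed below, while the proofs are
deferred to the Appendix.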

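(AS1) (a) $\varphi(x;\theta)$ is monotone in $x$ for any fixed $\theta\in\Theta$, where $\Theta=\{\theta_1\mid a_1<\theta_1<\infty\}$ if $\theta\in\mathbb{R}$; $\Theta=\{(\theta_1,\theta_2)\mid a_i<\theta_i<\infty,\ i=1,2\}$ if $\theta\in\mathbb{R}^2$;

(b) $\varphi(x;\theta)$ is increasing in $\theta_1$ (and in $\theta_2$ if $\theta\in\mathbb{R}^2$) for any fixed $x>0$;

(c) for fixed $x>0$ (and fixed $\theta_2$ if $\theta\in\mathbb{R}^2$), $\varphi(x;\theta)\to\infty\ (0)$, as $\theta_1\to\infty\ (a_1)$;

(d) for $\theta=(\theta_1,\theta_2)\in\mathbb{R}^2$ and fixed $x>0$, when $-\theta_1/\theta_2\to\gamma$ with $0\le\gamma\le\infty$: $\varphi(x;\theta)\to 0\ (\infty)$ if $x<\gamma\ (x>\gamma)$, as $\theta_2\to\infty$; $\varphi(x;\theta)\to 0\ (\infty)$ if $x>\gamma\ (x<\gamma)$, as $\theta_2\to a_2$;

(AS2) $\rho_0=n_0/n$ and $\rho_1=n_1/n$ remain the same as $n\to\infty$;

(AS3) $\sqrt{n_0}\int_0^\infty\frac{[h_2(x)]^k\varphi(x;\theta_0)}{\rho_0+\rho_1\varphi(x;\theta_0)}\,d[\hat F(x)-F_0(x)]\xrightarrow{D}N(0,\sigma_{F,k}^2)$, as $n\to\infty$, and $\sqrt{n_1}\int_0^\infty\frac{[h_2(x)]^k}{\rho_0+\rho_1\varphi(x;\theta_0)}\,d[\hat G(x)-G_0(x)]\xrightarrow{D}N(0,\sigma_{G,k}^2)$, as $n\to\infty$, where $k=0,1$, and $[h_2(x)]^0\equiv 1$;

(AS4) $\|\hat F-F_0\|\xrightarrow{a.s.}0$, $\|\hat G-G_0\|\xrightarrow{a.s.}0$, as $n\to\infty$;

(AS5) $\int_0^\infty[h_2(x)]^k\,d[\hat F(x)-F_0(x)]\xrightarrow{a.s.}0$, $\int_0^\infty[h_2(x)]^k\,d[\hat G(x)-G_0(x)]\xrightarrow{a.s.}0$, as $n\to\infty$, with finite $\int_0^\infty[h_2(x)]^k\,dF_0(x)$ and $\int_0^\infty[h_2(x)]^k\,dG_0(x)$, where $k=1,2,3$;

(AS6) $\sqrt{n_0}(\hat F-F_0)\xrightarrow{w}\mathbb{G}_F$, $\sqrt{n_1}(\hat G-G_0)\xrightarrow{w}\mathbb{G}_G$, as $n\to\infty$, where $\mathbb{G}_F$ and $\mathbb{G}_G$ are centered Gaussian processes.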

Theorem 1. Assume (AS0)-(AS5). Under model (1.1), we have:

(i) $\tilde\theta_n\xrightarrow{a.s.}\theta_0$, as $n\to\infty$;

(ii) $\sqrt{n}(\tilde\theta_n-\theta_0)\xrightarrow{D}N(0,\Sigma_0)$, as $n\to\infty$;

(iii) $\|\tilde F_n-F_0\|\xrightarrow{a.s.}0$, as $n\to\infty$.

Theorem 2. Assume (AS0)-(AS6). Under model (1.1), we have that
$\sqrt{n}(\tilde F_n-F_0)$ weakly converges to a centered Gaussian process.

Remark 1 (Assumptions of theorems). For our examples, (AS0)-(AS1)
hold, which will be discussed in Sections 4 and 5, respectively. From Gill
[7], Gu and Zhang [11], Huang [12], Huang and Wellner [13] and Geskus
and Groeneboom [6], we know that under some suitable conditions, (AS3)
holds for the censored data (1.2)-(1.7). We also know that for
these types of censored data, (AS4) holds under some suitable conditions;
see Stute and Wang [30], Gu and Zhang [11], Huang [12] and Groeneboom
and Wellner [10]. For right censored data, (AS5) holds under some
regularity conditions (Stute and Wang [30]). For other types of censored data,
(AS5) is implied by (AS4) if the support of $F_0$ is finite. On the other hand,
if a weaker consistency result is desired in Theorem 1(i), assumption (AS5)
can be weakened. Moreover, from Gill [7], Gu and Zhang [11] and Huang
[12], we know that (AS6) holds under some suitable conditions for right
censored data, doubly censored data and partly interval-censored data. The
techniques used in our proofs show that the weak convergence of $\tilde F_n$ for
interval censored data relies on that of NPMLE $\hat F$ or $\hat G$ for interval censored
data, which is currently unknown.

4. Biased sampling models. For the biased sampling problem in
Example 1, this section discusses assumptions (AS0)-(AS1), shows the unique
existence of SPMLE $\tilde\theta_n$ for $\theta_0\in\mathbb{R}$ in (1.8), and studies the weighted
empirical log-likelihood ratio for $w_0$.

Under (1.8), we have that in (AS0), $h_1(\theta)=1/\theta$ for $\theta\in\Theta=\{\theta\mid a_1=0<\theta<\infty\}$
and $h_2(x)\equiv 0$, and that (AS1)(a)-(c) obviously hold for any
monotone weight function $w(x)$, while (AS1)(d) does not apply. Since $h_2(x)\equiv 0$,
$\tilde\theta_n\in\mathbb{R}$ is determined by the first equation of (3.10). Note that (AS1)(c)
and the Dominated Convergence Theorem (DCT) imply: $\lim_{\theta\to 0}g_1(\theta)=-\hat G(\infty)/\rho_0<0$
and $\lim_{\theta\to\infty}g_1(\theta)=\hat F(\infty)/\rho_1>0$. Thus, the solution $\tilde\theta_n$ of
equation $g_1(\theta)=0$ uniquely exists because $g_1'(\theta)>0$ for $\theta>0$.
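
Because $g_1$ is strictly increasing with a sign change, a bracketing search is enough to locate the root under (1.8). The sketch below is a minimal illustration, assuming the NPMLEs are supplied as lists of jump points and jump sizes; the function name `solve_biased_sampling`, the use of bisection and the toy inputs are assumptions of this example rather than the author's procedure.

```python
def solve_biased_sampling(xs, fx, ys, gy, rho0, w, tol=1e-12):
    """Solve g_1(theta) = 0 in (3.10) for varphi = theta * w, then apply (3.11).

    (xs, fx): jump points and sizes of the NPMLE of F_0.
    (ys, gy): jump points and sizes of the NPMLE of G_0.
    Returns theta and a list of (point, mass) pairs giving the SPMLE jumps.
    """
    rho1 = 1.0 - rho0

    def g1(theta):
        first = sum(p * theta * w(x) / (rho0 + rho1 * theta * w(x)) for x, p in zip(xs, fx))
        second = sum(q / (rho0 + rho1 * theta * w(y)) for y, q in zip(ys, gy))
        return first - second

    lo = hi = 1.0
    while g1(lo) > 0.0:          # g_1 is negative near theta = 0
        lo /= 2.0
    while g1(hi) < 0.0:          # g_1 is positive for large theta
        hi *= 2.0
    while hi - lo > tol * hi:    # bisection on the bracketing interval
        mid = 0.5 * (lo + hi)
        if g1(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    pooled = [(x, rho0 * p) for x, p in zip(xs, fx)] + [(y, rho1 * q) for y, q in zip(ys, gy)]
    masses = [(pt, om / (rho0 + rho1 * theta * w(pt))) for pt, om in pooled]
    return theta, masses
```

When the second input is exactly the length-biased version of the first, the routine returns $\theta=1/w_0$ and the pooled masses reproduce the first distribution, which offers a quick sanity check.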



Weighted empirical log-likelihood ratio. From (3.3) and (3.6), we know
that under (1.8), the weighted empirical likelihood ratio is given by
$R(F)=L(\theta,F)/L(\tilde\theta_n,\tilde F_n)=(\theta/\tilde\theta_n)^{n\rho_1}\prod_{i=1}^{m}(p_i/\tilde p_i)^{n\omega_i}$, where $F(x)=\sum_{i=1}^{m}p_iI\{W_i\le x\}$,
$\theta=1/[\sum_{i=1}^{m}p_iw(W_i)]$ and $\tilde p_i=\omega_i/[\rho_0+\rho_1\tilde\theta_nw(W_i)]$. Then, the set
$S=\{\int_0^\infty w(x)\,dF(x)\mid R(F)\ge c\}$ may be used as a confidence interval for $w_0$, where
$0<c<1$ is a constant. Let

$$r(\theta_0)=\sup\Bigl\{(\theta_0/\tilde\theta_n)^{n\rho_1}\prod_{i=1}^{m}(p_i/\tilde p_i)^{n\omega_i}\ \Big|\ p_i\ge 0,\ \sum_{i=1}^{m}p_i=1,\ \sum_{i=1}^{m}p_iw(W_i)=\frac{1}{\theta_0}\Bigr\}.\tag{4.1}$$

It is easy to show that $S$ is an interval expressed by $S=[X_L,X_U]$, and that
$X_L\le w_0\le X_U$ if and only if $r(\theta_0)\ge c$, where $X_L=\inf\{\int_0^\infty w(x)\,dF(x)\mid F\in\mathcal{F}\}$
and $X_U=\sup\{\int_0^\infty w(x)\,dF(x)\mid F\in\mathcal{F}\}$ for $\mathcal{F}=\{F\mid R(F)\ge c,\ p_i\ge 0,\ \sum_{i=1}^{m}p_i=1\}$.
We call $[X_L,X_U]$ the weighted empirical likelihood ratio
confidence interval for $w_0$, and the limiting distribution of the weighted empirical
log-likelihood ratio for the censored data (1.2)-(1.7) is given in the
following theorem with a proof sketched in the Appendix.

Theorem 3. Assume (AS2)-(AS5) for model (1.8). Then, $-2\ln r(\theta_0)\xrightarrow{D}c_0\chi_1^2$,
as $n\to\infty$, where $0<c_0<\infty$ is a constant and $\chi_1^2$ has a chi-squared
distribution with one degree of freedom.



5. Case-control logistic regression models. For the case-control logistic
regression model in Example 2, this section discusses assumptions (AS0)-(AS1),
shows the unique existence of SPMLE $\tilde\theta_n$ for $\theta_0\in\mathbb{R}^2$ in (1.9), and
provides a goodness-of-fit test for model (1.10).

Under (1.9), we have that in (AS0)-(AS1), $h_1(\theta)\equiv 1$ for $\theta\in\Theta$ with $a_1=a_2=-\infty$
and $h_2(x)=x$, and that (AS1) holds for $\varphi(x;\theta)=\exp(\alpha+\beta x)$
with $\theta=(\alpha,\beta)\in\mathbb{R}^2$. In the Appendix, we show that the solution $\tilde\theta_n$ of
(3.10) exists uniquely.

Goodness-of-fit test. To assess the validity of logistic regression model
assumption (1.10) with censored data, note that there are two ways to
estimate d.f. $F_0$ in (1.9) using censored data (1.11). One is the NPMLE $\hat F$
based on the first sample, and the other is the SPMLE $\tilde F_n$ based on both
samples under model assumption (1.10), that is, (1.9). Based on Theorems
1 and 2, we have the following corollary on the asymptotic properties of $\hat F$
and $\tilde F_n$ with proofs deferred to the Appendix.

Corollary 1. Assume (AS2)-(AS5) for model (1.9). Then, as $n\to\infty$:

(i) $\|\tilde F_n-\hat F\|\xrightarrow{a.s.}0$ under model (1.10);

(ii) $\|\tilde F_n-F_1\|\xrightarrow{a.s.}0$ when model (1.10) does not hold [i.e., $g_0(x)=\varphi(x;\theta_0)f_0(x)$ does not hold], where $F_1\neq F_0$;

(iii) $\sqrt{n}(\tilde F_n-\hat F)$ weakly converges to a centered Gaussian process under
model (1.10) and assumption (AS6).

Thus, from Remark 1 we know that for right censored data, doubly
censored data and partly interval-censored data, we may use the following
Kolmogorov-Smirnov-type statistic to measure the difference between $\hat F$
and $\tilde F_n$, which gives a goodness-of-fit test statistic for case-control logistic
regression model (1.10):

$$T_n=\sqrt{n}\,\|\tilde F_n-\hat F\|=\sqrt{n}\sup_{0\le t<\infty}|\tilde F_n(t)-\hat F(t)|.\tag{5.1}$$
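
Since both estimators are right-continuous step functions, the supremum in (5.1) only needs to be evaluated at their jump points. The sketch below illustrates this; the names `step_cdf` and `ks_type_statistic` and the representation of each estimator as lists of points and masses are assumptions of this example.

```python
import math


def step_cdf(points, masses):
    """Right-continuous step function t -> sum of the masses at points <= t."""
    pairs = list(zip(points, masses))
    return lambda t: sum(m for pt, m in pairs if pt <= t)


def ks_type_statistic(n, points_a, masses_a, points_b, masses_b):
    """sqrt(n) * sup_t |A(t) - B(t)| for two step functions, as in (5.1)."""
    cdf_a = step_cdf(points_a, masses_a)
    cdf_b = step_cdf(points_b, masses_b)
    grid = sorted(set(points_a) | set(points_b))   # the sup is attained at a jump point
    return math.sqrt(n) * max(abs(cdf_a(t) - cdf_b(t)) for t in grid)
```

Between consecutive jump points both functions are constant, so restricting the search to the pooled jump points loses nothing.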

Bootstrap method. To compute the p-value for test statistic $T_n$ in (5.1),
we suggest the following $n$ out of $n$ bootstrap method. Since $\tilde\theta_n=(\tilde\alpha_n,\tilde\beta_n)$
is determined by (3.10), it is a functional of the NPMLE $\hat F$ and $\hat G$, denoted
as $\tilde\theta_n=\theta(\hat F,\hat G)$; in turn, (3.11) implies that $\tilde F_n(t)-\hat F(t)$ is a functional of
$\hat F$ and $\hat G$, denoted as $\tilde F_n-\hat F=\tau(\hat F,\hat G)$. Note that under model (1.1), $\theta_0$ is
the unique solution of equation(s):

$$\begin{aligned}
0&=g_{01}(\theta)=\int_0^\infty\frac{\varphi(x;\theta)}{\rho_0+\rho_1\varphi(x;\theta)}\,dF_0(x)-\int_0^\infty\frac{1}{\rho_0+\rho_1\varphi(x;\theta)}\,dG_0(x),\\
0&=g_{02}(\theta)=\int_0^\infty\frac{h_2(x)\,\varphi(x;\theta)}{\rho_0+\rho_1\varphi(x;\theta)}\,dF_0(x)-\int_0^\infty\frac{h_2(x)}{\rho_0+\rho_1\varphi(x;\theta)}\,dG_0(x),
\end{aligned}\tag{5.2}$$

by which we always mean that $\theta_0\in\mathbb{R}$ is the solution of $g_{01}(\theta)=0$ if $h_2(x)\equiv 0$.
Thus, under (1.9) we have $\theta_0=(\alpha_0,\beta_0)=\theta(F_0,G_0)$; in turn, $\tau(F_0,G_0)=0$,
which means $T_n=\sqrt{n}\,\|\tau(\hat F,\hat G)-\tau(F_0,G_0)\|$ under model (1.10). Hence, from
the formulation given in Bickel and Ren [3], the distribution of $T_n$ under
model (1.10) can be estimated by that of $T_n^*=\sqrt{n}\,\|\tau(\hat F^*,\hat G^*)-\tau(\hat F,\hat G)\|$,
where $\hat F^*$ and $\hat G^*$ are calculated based on the $n$ out of $n$ bootstrap samples,
respectively. For instance, $\hat F^*$ is calculated based on the bootstrap sample
$O_1^{X*},\ldots,O_{n_0}^{X*}$ taken with replacement from $\{O_1^X,\ldots,O_{n_0}^X\}$. The p-value is
estimated by the percentage of $T_n^*$'s that are greater than test statistic $T_n$.
Note that the $n$ out of $n$ bootstrap consistency for $\sqrt{n}(\hat F-F_0)$ estimated by
$\sqrt{n}(\hat F^*-\hat F)$ has been established for right censored data, doubly censored
data and partly interval-censored data by Bickel and Ren [2] and Huang
[12].
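
The resampling loop described above can be sketched as follows. The callable `t_star`, which would recompute the NPMLEs on the resampled data and return $T_n^*$, is a hypothetical user-supplied component; the function name, the default number of replications and the seed are also assumptions of this illustration.

```python
import random


def bootstrap_p_value(obs_x, obs_y, t_n, t_star, n_boot=500, seed=0):
    """Estimate the p-value as the fraction of bootstrap statistics exceeding t_n.

    obs_x, obs_y: the two observed (possibly censored) samples in (1.11).
    t_star(sample_x, sample_y): hypothetical callable returning T_n^* for one
    pair of resampled data sets.
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_boot):
        star_x = [rng.choice(obs_x) for _ in obs_x]   # n_0 draws with replacement
        star_y = [rng.choice(obs_y) for _ in obs_y]   # n_1 draws with replacement
        if t_star(star_x, star_y) > t_n:
            exceed += 1
    return exceed / n_boot
```

Each sample is resampled separately and at its own size, matching the description of the $n$ out of $n$ bootstrap for the two independent samples.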

Remark 2. The proposed test (5.1) can be used for any type of censored
data as long as (AS2)-(AS6) hold. When (AS6) does not hold, such as for
interval censored data, Corollary 1 shows that we may graphically check the
model fitting for (1.10) by comparing curves of $\hat F$ and $\tilde F_n$. Note that when
model (1.10) does not hold, statistic $T_n^*$ is still asymptotically a function of a
centered Gaussian process, but $T_n\xrightarrow{a.s.}\infty$ based on Corollary 1(ii). Thus, our
proposed test is consistent. In terms of computing $(\tilde\alpha_n,\tilde\beta_n)$, it can be done
using the Newton-Raphson method described on page 374 of Press et al.
[21] to solve (3.10); a computation routine in FORTRAN is available from
the author. Although not presented here, our extensive simulation studies
on $(\tilde\alpha_n,\tilde\beta_n)$ and the comparison between the distributions of $T_n$ and $T_n^*$ give
excellent results.
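
A minimal Python sketch of a Newton-Raphson iteration for (3.10) under (1.9) is given below. It is not the author's FORTRAN routine: the function name `solve_logistic_tilt`, the step-halving safeguard based on the convex objective $R_n$ of (A.1), and the starting value $(0,0)$ are assumptions of this illustration. The NPMLEs are assumed to be supplied as jump points and jump sizes.

```python
import math


def solve_logistic_tilt(xs, fx, ys, gy, rho0, tol=1e-10, max_iter=100):
    """Solve g_1 = g_2 = 0 in (3.10) for varphi(x) = exp(a + b*x) by damped Newton steps."""
    rho1 = 1.0 - rho0

    def objective(a, b):
        # R_n(theta) from (A.1); its gradient is (g_1, g_2) when h_1 = 1.
        total = 0.0
        for x, p in zip(xs, fx):
            total += p * math.log(rho0 + rho1 * math.exp(a + b * x)) / rho1
        for y, q in zip(ys, gy):
            phi = math.exp(a + b * y)
            total += q * math.log((rho0 + rho1 * phi) / phi) / rho0
        return total

    def grad_hess(a, b):
        g1 = g2 = h11 = h12 = h22 = 0.0
        for x, p in zip(xs, fx):
            phi = math.exp(a + b * x)
            d = rho0 + rho1 * phi
            g1 += p * phi / d
            g2 += p * x * phi / d
            wt = rho0 * p * phi / d ** 2        # Hessian weight from the first sample
            h11 += wt; h12 += wt * x; h22 += wt * x * x
        for y, q in zip(ys, gy):
            phi = math.exp(a + b * y)
            d = rho0 + rho1 * phi
            g1 -= q / d
            g2 -= q * y / d
            wt = rho1 * q * phi / d ** 2        # Hessian weight from the second sample
            h11 += wt; h12 += wt * y; h22 += wt * y * y
        return (g1, g2), (h11, h12, h22)

    a = b = 0.0
    for _ in range(max_iter):
        (g1, g2), (h11, h12, h22) = grad_hess(a, b)
        if math.hypot(g1, g2) < tol:
            break
        det = h11 * h22 - h12 * h12
        da = (h22 * g1 - h12 * g2) / det        # Newton direction: Hessian^{-1} * gradient
        db = (h11 * g2 - h12 * g1) / det
        step, base = 1.0, objective(a, b)
        while objective(a - step * da, b - step * db) > base + 1e-12 and step > 1e-8:
            step /= 2.0                         # halve the step until R_n does not increase
        a, b = a - step * da, b - step * db
    return a, b
```

The Hessian weights follow from differentiating $g_1$ and $g_2$ with $h_1\equiv 1$ and $h_2(x)=x$, so each Newton step is a step toward the unique minimizer of the strictly convex $R_n$.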



APPENDIX 



Proof of "$\tilde\theta_n$ is equivalently given by the solution of (3.10)."
Under assumption $\sum_{i=1}^{m_0}\hat p_i^X=\sum_{j=1}^{m_1}\hat p_j^Y=1$, the first equation of (3.9) is
equivalent to $\psi(\rho_1;\theta)=0$, which by (3.7) and (3.1)-(3.2), gives $g_1(\theta)=0$
in (3.10). The proof follows from the fact that (3.6) and $\lambda(\theta)=\rho_1$ imply that the
second equation of (3.9) is $0=-\rho_0g_2(\theta)$. □
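
Proof of "unique existence of $\tilde\theta_n$ in Example 2." Let

$$R_n(\theta)=\int_0^\infty\frac{1}{\rho_1}\ln[\rho_0+\rho_1\varphi(x;\theta)]\,d\hat F(x)+\int_0^\infty\frac{1}{\rho_0}\ln\frac{\rho_0+\rho_1\varphi(x;\theta)}{\varphi(x;\theta)}\,d\hat G(x).\tag{A.1}$$

Since $\hat F$ and $\hat G$ are step functions with finite jumps, we know that $R_n(\theta)$ is
well defined on $\mathbb{R}^2$. From (A.1) and (3.10), we have $\nabla R_n(\theta)=h_1(\theta)(g_1(\theta),g_2(\theta))^T$ and

$$\begin{aligned}
\Sigma_{R_n,\theta}&=\begin{pmatrix}\dfrac{\partial^2R_n}{\partial\theta_1^2} & \dfrac{\partial^2R_n}{\partial\theta_2\,\partial\theta_1}\\[2ex] \dfrac{\partial^2R_n}{\partial\theta_1\,\partial\theta_2} & \dfrac{\partial^2R_n}{\partial\theta_2^2}\end{pmatrix}\\
&=(g_1(\theta),g_2(\theta))^T(\nabla h_1(\theta))^T+[h_1(\theta)]^2\int_0^\infty\begin{pmatrix}1 & h_2(x)\\ h_2(x) & h_2^2(x)\end{pmatrix}\frac{\varphi(x;\theta)}{[\rho_0+\rho_1\varphi(x;\theta)]^2}\,d[\rho_0\hat F(x)+\rho_1\hat G(x)].
\end{aligned}\tag{A.2}$$

Thus, $\nabla R_n(\theta)=0$ is equivalent to (3.10) because $h_1(\theta)>0$ by (AS0). For
Example 2, we have $h_1(\theta)\equiv 1$ and $h_2(x)=x$, which imply that $\Sigma_{R_n,\theta}$ is
a positive-definite matrix. Hence, $R_n(\theta)$ is strictly convex. Moreover, note
that under (1.9), we have in (A.1) $R_n(\theta)>(\ln\rho_0)/\rho_1+(\ln\rho_1)/\rho_0$ for any
$\theta=(\alpha,\beta)\in\mathbb{R}^2$, and that by a similar argument used in (6.5) of Ren and Gu
[27], we can show: $\lim_{\lambda\to\infty}\inf R_n(\lambda e_1,\lambda e_2)=\infty$ for any $e_1^2+e_2^2=1$. Hence,
$R_n(\theta)$ has a unique global minimum point which must be the solution of
(3.10) (see pages 101-102 of Bazaraa, Sherali and Shetty [1]). □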

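Proof of Theorem 1(i). Let $\hat\mu(x)=\rho_0\hat F(x)+\rho_1\hat G(x)$; then (3.10)
gives

$$\hat F(\infty)=\int_0^\infty\frac{d\hat\mu(x)}{\rho_0+\rho_1\varphi(x;\tilde\theta_n)}\le\frac{1}{\rho_0},\qquad
\hat G(\infty)=\int_0^\infty\frac{\varphi(x;\tilde\theta_n)\,d\hat\mu(x)}{\rho_0+\rho_1\varphi(x;\tilde\theta_n)}\le\frac{1}{\rho_1},\tag{A.3}$$

where (AS4) implies $\hat F(\infty)\xrightarrow{a.s.}1$, $\hat G(\infty)\xrightarrow{a.s.}1$, as $n\to\infty$. In what follows, we show
$\tilde\theta_n=O(1)$ almost surely for case $\tilde\theta_n=(\tilde\theta_n^{(1)},\tilde\theta_n^{(2)})\in\mathbb{R}^2$ (the proof for case
$\tilde\theta_n\in\mathbb{R}$ is similar).

Assume $\tilde\theta_n^{(2)}\ge 0$. If $\tilde\theta_n^{(1)}\to\infty$, then from integration by parts, the
boundedness of the integrand function, (AS1)(b)-(c) and the DCT, we have that
in (A.3):

$$1=\lim_{n\to\infty}\int_0^\infty\frac{d\mu_0(x)}{\rho_0+\rho_1\varphi(x;\tilde\theta_n)}\le\int_0^\infty\lim_{n\to\infty}\frac{d\mu_0(x)}{\rho_0+\rho_1\varphi(x;\tilde\theta_n^{(1)},0)}=0,\tag{A.4}$$

a contradiction, where $\mu_0(x)=\rho_0F_0(x)+\rho_1G_0(x)$. Thus, $\tilde\theta_n^{(2)}\ge 0$ implies
$\tilde\theta_n^{(1)}=O(1)$ or $\tilde\theta_n^{(1)}\to a_1$. Similarly, we know that $0\le\tilde\theta_n^{(2)}\le M_2<\infty$ and
$\tilde\theta_n^{(1)}\to a_1$ imply $1=\lim\int_0^\infty[\rho_0+\rho_1\varphi(x;\tilde\theta_n)]^{-1}\,d\mu_0(x)\ge\int_0^\infty\lim[\rho_0+\rho_1\varphi(x;\tilde\theta_n^{(1)},M_2)]^{-1}\,d\mu_0(x)=1/\rho_0$,
a contradiction. Hence, if $\tilde\theta_n^{(2)}\ge 0$, then $\tilde\theta_n^{(2)}=O(1)$
implies $\tilde\theta_n^{(1)}=O(1)$.

Assume $\tilde\theta_n^{(2)}\to\infty$, $-\tilde\theta_n^{(1)}/\tilde\theta_n^{(2)}\to\gamma$ with $0\le\gamma\le\infty$. Similarly as (A.4),
(AS1) gives

$$1=\int_0^\infty\lim_{n\to\infty}\frac{d\mu_0(x)}{\rho_0+\rho_1\varphi(x;\tilde\theta_n)}=\frac{\mu_0(\gamma)}{\rho_0},\tag{A.5}$$

where we must have $0<\gamma<\infty$ to be inside the support of $F_0$; a contradiction
otherwise. Also, if we let $n\to\infty$ in the second equation of (3.10), then
from (AS4)-(AS5), Holder's inequality, the DCT and an argument similar
to above, we have

$$\frac{1}{\rho_1}\int_\gamma^\infty h_2(x)\,dF_0(x)=\frac{1}{\rho_0}\int_0^\gamma h_2(x)\,dG_0(x).\tag{A.6}$$

However, (A.5)-(A.6) contradict
$[G_0(\gamma)\int_\gamma^\infty h_2(x)\,dF_0(x)-\bar F_0(\gamma)\int_0^\gamma h_2(x)\,dG_0(x)]=\iint_{x<\gamma<y}[h_2(y)-h_2(x)]\,dF_0(y)\,dG_0(x)\neq 0$,
where $\bar F_0=1-F_0$, which is implied by (AS0). Thus, if $\tilde\theta_n^{(2)}\ge 0$, we must have $\tilde\theta_n^{(2)}=O(1)$; in turn,
$\tilde\theta_n^{(1)}=O(1)$. Similarly, we can show $\tilde\theta_n^{(2)}=O(1)$ and $\tilde\theta_n^{(1)}=O(1)$ if $\tilde\theta_n^{(2)}<0$. Hence,
we have $\tilde\theta_n=O(1)$ almost surely.

Assume $\tilde\theta_n\to\eta_0$, as $n\to\infty$. Then, from (3.10) and an argument similar
to that used in (A.6), we know that $\eta_0$ is a solution of (5.2). Note that for
nondegenerating $h_2(x)$, to obtain the second equation of (5.2) for $\eta_0$ we use
(AS5) and the proof of Lemma 3 of Gill [8], noticing that $h_2(x)$ is monotone
and $[\rho_0+\rho_1\varphi(x;\eta_0)]^{-1}$ is bounded and continuous. Hence, the proof follows
from the uniqueness of the solution for (5.2). □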

Proof of Theorem 1(ii). Here, we only prove the case $\tilde\theta_n\in\mathbb{R}^2$, because
the proof for case $\tilde\theta_n\in\mathbb{R}$ is similar. For $R_n(\theta)$ in (A.1), we have that
under model (1.1):

\[
\begin{aligned}
\nabla R_n(\theta_0) &= h_1(\theta_0)\big([g_1(\theta_0)-g_{01}(\theta_0)],\,[g_2(\theta_0)-g_{02}(\theta_0)]\big)^{T},\\
\nabla R_n(\tilde\theta_n) &= \nabla R_n(\theta_0)+\Sigma_{R_n,\theta_0}(\tilde\theta_n-\theta_0)^{T}+\tfrac12\big(r_1(\tilde\theta_n),r_2(\tilde\theta_n)\big)^{T},
\end{aligned}
\tag{A.7}
\]

where $g_1,g_2$ and $g_{01},g_{02}$ are given in (3.10) and (5.2), respectively; $\Sigma_{R_n,\theta}$
is given in (A.2); and from (AS5), Theorem 1(i) and straightforward calculation
based on (A.2), we have $r_i(\tilde\theta_n)=o_p(\tilde\theta_n-\theta_0)$. From (A.7), (AS3),
the independence between $\hat F$ and $\hat G$, and page 4 of Serfling [29], we know
that $\sqrt{n}\,\nabla R_n(\theta_0)$ converges in distribution to a normal random vector, while
(A.2), (5.2) and a similar argument in (A.6) imply



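\[
\Sigma_{R_n,\theta_0}\xrightarrow{\text{a.s.}}\Sigma_1
= h_1^2(\theta_0)\int_0^\infty
\begin{pmatrix}1 & h_2(x)\\ h_2(x) & h_2^2(x)\end{pmatrix}
\frac{\varphi(x;\theta_0)\,d\mu_0(x)}{[\rho_0+\rho_1\varphi(x;\theta_0)]^2}
\quad\text{as } n\to\infty,
\tag{A.8}
\]

where $\Sigma_1$ is positive-definite. Hence, $\nabla R_n(\tilde\theta_n)=0$, (A.7)-(A.8) and Theorem 1(i)
give

\[
\sqrt{n}\,(\tilde\theta_n-\theta_0) = -\Sigma_1^{-1}\sqrt{n}\,\nabla R_n(\theta_0)+o_p(1).
\tag{A.9}
\]
□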
Proof of Theorem 1(iii). Here, we only prove the case $\tilde\theta_n\in\mathbb{R}^2$, because
the proof for case $\tilde\theta_n\in\mathbb{R}$ is similar. For any $t>0$, we let $\tilde F_n(t)=g_3(\tilde\theta_n)$
in (3.11); then

\[
\tilde F_n(t)=g_3(\tilde\theta_n)
= g_3(\theta_0)+(\tilde\theta_n-\theta_0)\nabla g_3(\theta_0)+\tfrac12(\tilde\theta_n-\theta_0)\,\Sigma_{g_3,\xi_n}(\tilde\theta_n-\theta_0)^{T},
\tag{A.10}
\]

where $\xi_n$ is between $\tilde\theta_n$ and $\theta_0$, and

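\[
\begin{aligned}
\nabla g_3(\theta) &= -\rho_1h_1(\theta)\int_0^t
\begin{pmatrix}1\\ h_2(x)\end{pmatrix}
\frac{\varphi(x;\theta)}{[\rho_0+\rho_1\varphi(x;\theta)]^2}\,d\hat\mu(x),\\[4pt]
\Sigma_{g_3,\theta} &=
\begin{pmatrix}
\dfrac{\partial^2 g_3}{\partial\theta_1^2} & \dfrac{\partial^2 g_3}{\partial\theta_1\,\partial\theta_2}\\[8pt]
\dfrac{\partial^2 g_3}{\partial\theta_2\,\partial\theta_1} & \dfrac{\partial^2 g_3}{\partial\theta_2^2}
\end{pmatrix}\\[4pt]
&= [h_1(\theta)]^{-1}\,\nabla g_3(\theta)\,[\nabla h_1(\theta)]^{T}\\
&\quad -\rho_1h_1^2(\theta)\int_0^t
\begin{pmatrix}1 & h_2(x)\\ h_2(x) & h_2^2(x)\end{pmatrix}
\frac{\varphi(x;\theta)\,[\rho_0-\rho_1\varphi(x;\theta)]}{[\rho_0+\rho_1\varphi(x;\theta)]^3}\,d\hat\mu(x).
\end{aligned}
\tag{A.11}
\]
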
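From (AS5) and Theorem 1(i), we have that uniformly in $t$,

\[
\left|\frac{\partial^2 g_3(\xi_n)}{\partial\theta_1^2}\right|
\le \int_0^\infty \frac{\rho_1\varphi(x;\xi_n)\,\big[\,|\partial h_1(\xi_n)/\partial\theta_1|+h_1^2(\xi_n)\big]}{[\rho_0+\rho_1\varphi(x;\xi_n)]^2}\,d\hat\mu(x)
= O_{\text{a.s.}}(1),
\]

which also holds for other partial derivatives in (A.11). Thus, Theorem 1(ii)
implies that with $(\tilde\theta_n-\theta_0)\nabla g_3(\theta_0)=o_{\text{a.s.}}(1)$, (A.10) can be written as

\[
\tilde F_n(t)=g_3(\theta_0)+(\tilde\theta_n-\theta_0)\nabla g_3(\theta_0)+o_{\text{a.s.}}(|\tilde\theta_n-\theta_0|^2).
\tag{A.12}
\]

From (AS4) and integration by parts, we have $|g_3(\theta_0)-F_0(t)|\xrightarrow{\text{a.s.}}0$ for any
fixed $t>0$; in turn, the proof follows from (A.12) and Pólya's Theorem. □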
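
As an informal check of the gradient formula in (A.11), the following sketch compares it with central finite differences in the logistic case $\varphi(x;\theta)=\exp(\alpha+\beta x)$, where $h_1(\theta)=1$ and $h_2(x)=x$. It assumes that $g_3(\theta)=\int_0^t[\rho_0+\rho_1\varphi(x;\theta)]^{-1}\,d\hat\mu(x)$ and that $\hat\mu$ is a discrete measure; the support points, masses, parameter values and all function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def phi(x, theta):
    """Logistic-type weight exp(alpha + beta * x); here h1 = 1 and h2(x) = x."""
    alpha, beta = theta
    return np.exp(alpha + beta * x)

def g3(theta, t, pts, mass, rho0, rho1):
    """Assumed form of g3: sum over atoms x_i <= t of mass_i / (rho0 + rho1 * phi_i)."""
    keep = pts <= t
    return np.sum(mass[keep] / (rho0 + rho1 * phi(pts[keep], theta)))

def grad_g3(theta, t, pts, mass, rho0, rho1):
    """Analytic gradient: -rho1 * sum of (1, x_i)^T * phi_i / (rho0 + rho1 * phi_i)^2 * mass_i."""
    keep = pts <= t
    x, m = pts[keep], mass[keep]
    p = phi(x, theta)
    core = m * p / (rho0 + rho1 * p) ** 2
    return -rho1 * np.array([np.sum(core), np.sum(x * core)])

def numeric_grad(f, theta, eps=1e-6):
    """Central finite differences, used only as an independent check."""
    theta = np.asarray(theta, dtype=float)
    out = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        out[j] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return out

rng = np.random.default_rng(0)
pts = np.sort(rng.exponential(size=50))   # hypothetical atoms of mu-hat
mass = np.full(50, 1.0 / 50)              # hypothetical equal masses
rho0, rho1 = 0.6, 0.4                     # hypothetical sample proportions
theta0 = np.array([-0.3, 0.5])            # hypothetical (alpha, beta)
t = 1.2

analytic = grad_g3(theta0, t, pts, mass, rho0, rho1)
numeric = numeric_grad(lambda th: g3(th, t, pts, mass, rho0, rho1), theta0)
print(analytic, numeric)
```

The two vectors should agree to several digits; a mismatch would point to an error in the assumed form of $g_3$ rather than in the proof itself.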

Proof of Theorem 2. Here, we only prove the case $\tilde\theta_n\in\mathbb{R}^2$, because
the proof for case $\tilde\theta_n\in\mathbb{R}$ is similar. Let $(\hat v_1,\hat v_2)^T=\nabla g_3(\theta_0)$ as in (A.11),
and let $(v_1,v_2)^T=-\rho_1h_1(\theta_0)\int_0^t(1,h_2(x))^T\varphi(x;\theta_0)[\rho_0+\rho_1\varphi(x;\theta_0)]^{-2}\,d\mu_0(x)$.
From (AS4) and integration by parts, we have $|\hat v_k(t)-v_k(t)|\xrightarrow{\text{a.s.}}0$ for any
fixed $t>0$, where $k=1,2$. Since $\hat v_k(t)$ and $v_k(t)$ are continuous and monotone
in $t$, then from (AS5) and a similar argument used in the proof of
Theorem 1(i) for showing $\eta_0$ as the solution of (5.2), we have $\|\hat v_k-v_k\|\to0$,
as $n\to\infty$. Thus, if we let $u_0(x)=1/[\rho_0+\rho_1\varphi(x;\theta_0)]$, $u_1(x)=u_0(x)\varphi(x;\theta_0)$,
and $\lambda_{ij}$ the elements of $\Sigma_1^{-1}$, then (A.7), (A.9), (A.12), (1.1) and Theorem 1(ii)
imply

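\[
\begin{aligned}
\sqrt{n}\,[\tilde F_n(t)-F_0(t)]
&= o_p(1)+\sqrt{n}\left[\int_0^t u_0(x)\,d\hat\mu(x)-F_0(t)-(v_1(t),v_2(t))\,\Sigma_1^{-1}\nabla R_n(\theta_0)\right]\\
&= o_p(1)+\sqrt{n}\,(\hat U_F-U_F)+\sqrt{n}\,(\hat U_G-U_G),
\end{aligned}
\tag{A.13}
\]

where for $s_1(t)=h_1(\theta_0)[\lambda_{11}v_1(t)+\lambda_{21}v_2(t)]$ and $s_2(t)=h_1(\theta_0)[\lambda_{12}v_1(t)+\lambda_{22}v_2(t)]$,

\[
\begin{aligned}
\sqrt{n}\,[\hat U_F(t)-U_F(t)]
&= \tau_1\big(\sqrt{n}\,(\hat F-F_0)\big)\\
&= \sqrt{n}\,\bigg\{\rho_0\int_0^t u_0(x)\,d[\hat F(x)-F_0(x)]
 - s_1(t)\int_0^\infty u_1(x)\,d[\hat F(x)-F_0(x)]\\
&\qquad\quad - s_2(t)\int_0^\infty u_1(x)h_2(x)\,d[\hat F(x)-F_0(x)]\bigg\},
\end{aligned}
\tag{A.14}
\]

\[
\begin{aligned}
\sqrt{n}\,[\hat U_G(t)-U_G(t)]
&= \tau_2\big(\sqrt{n}\,(\hat G-G_0)\big)\\
&= \sqrt{n}\,\bigg\{\rho_1\int_0^t u_0(x)\,d[\hat G(x)-G_0(x)]
 + s_1(t)\int_0^\infty u_0(x)\,d[\hat G(x)-G_0(x)]\\
&\qquad\quad + s_2(t)\int_0^\infty u_0(x)h_2(x)\,d[\hat G(x)-G_0(x)]\bigg\}.
\end{aligned}
\tag{A.15}
\]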



As (A.14) is a linear functional of $\sqrt{n}\,(\hat F-F_0)$, (AS6) implies
$\sqrt{n}\,(\hat U_F-U_F)\stackrel{w}{\Rightarrow}\tau_1(\mathbb{G}_F)$, as $n\to\infty$, where from pages 154-157 of Iranpour and
Chacon [14], we know that $\tau_1(\mathbb{G}_F)$ is a centered Gaussian process. Similarly,
$\sqrt{n}\,[\hat U_G-U_G]$ in (A.15) weakly converges to a centered Gaussian process
$\tau_2(\mathbb{G}_G)$. The proof follows from (A.13)-(A.15), and that $\tau_1(\mathbb{G}_F)$ and $\tau_2(\mathbb{G}_G)$
are two independent centered Gaussian processes. □

Proof of Corollary 1. Note that part (i) follows directly from Theorem 1(iii)
and (AS4), while part (iii) follows from some minor adjustments
in the proof of Theorem 2. Thus, we only give the proof of part (ii) as follows.
Here, we have $h_1(\theta)=1$ and $h_2(x)=x$; thus in (A.2) we have $\nabla h_1(\theta)=0$.
From the proofs of the unique existence of $\tilde\theta_n$ and Theorem 1(i), we
know that when model (1.10) does not hold, $\tilde\theta_n$ is still well defined, and
satisfies $|\tilde\theta_n-\theta_1|\xrightarrow{\text{a.s.}}0$, as $n\to\infty$, where $\theta_1=(\alpha_1,\beta_1)$ is the unique solution
of (5.2) for $\varphi(x;\theta)=\exp(\alpha+\beta x)$. Applying this, (AS4) and integration by
parts to (3.11), we have $\|\tilde F_n-F_1\|\xrightarrow{\text{a.s.}}0$, where
$F_1(t)=\int_0^t[\rho_0+\rho_1\exp(\alpha_1+\beta_1x)]^{-1}\,d\mu_0(x)$. It is easy to verify that $F_1\ne F_0$ when (1.10) does not hold
[otherwise, we have $g_0(x)=\varphi(x;\theta_1)f_0(x)$ with $\theta_1=\theta_0$], and that the first
equation of (5.2) implies that $F_1$ is a distribution function. □

Proof of Theorem 3. For a simpler argument, we assume Y^iPf = 
J2i^\PY = 1 in (3.1), which can be removed with some additional work in 
our proof here. To get an expression of $r(\theta_0)$, it can be shown by using the
Lagrange multiplier method that the solution of the maximization problem
in (4.1) is $p_i=\omega_i/(1+\lambda_0U_i)$, $1\le i\le m$, where $U_i=[\theta_0w(W_i)-1]$, $\omega_i$ is
given in (3.2), and $\lambda_0$ is the unique solution of equation $\psi(\lambda)=0$ on interval
$(-U_{(m)}^{-1},-U_{(1)}^{-1})$ for $\psi(\lambda)=\sum_{i=1}^mp_iU_i=\sum_{i=1}^m(\omega_iU_i)/(1+\lambda U_i)$. Thus, we have

\[
\ln r(\theta_0) = -n\sum_{i=1}^m\omega_i\ln\!\left(\frac{1+\lambda_0U_i}{\rho_0+\rho_1\tilde\theta_nw(W_i)}\right)-n\rho_1\ln(\tilde\theta_n/\theta_0).
\tag{A.16}
\]
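
The Lagrange multiplier step above can be illustrated numerically. The sketch below solves $\psi(\lambda)=0$ by bisection on $(-U_{(m)}^{-1},-U_{(1)}^{-1})$ and then forms $p_i=\omega_i/(1+\lambda_0U_i)$. It is not the paper's implementation: the data $W_i$, the equal weights $\omega_i=1/m$, the choice $w(x)=x$ and the value of $\theta_0$ are all hypothetical, and the weights of (3.2) for censored data would generally differ.

```python
import numpy as np

def psi(lam, omega, U):
    """psi(lambda) = sum_i omega_i * U_i / (1 + lambda * U_i)."""
    return np.sum(omega * U / (1.0 + lam * U))

def solve_lambda(omega, U, tol=1e-12, max_iter=200):
    """Bisection for psi(lambda) = 0 on (-1/max(U), -1/min(U)).

    Requires min(U) < 0 < max(U). On this interval every 1 + lambda * U_i is
    positive and psi is strictly decreasing, so the root is unique.
    """
    if not (U.min() < 0.0 < U.max()):
        raise ValueError("U must take both negative and positive values")
    lo, hi = -1.0 / U.max(), -1.0 / U.min()
    width = hi - lo
    lo, hi = lo + 1e-10 * width, hi - 1e-10 * width  # stay strictly inside
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if psi(mid, omega, U) > 0.0:
            lo = mid   # psi is decreasing, so the root lies to the right
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

rng = np.random.default_rng(1)
W = rng.exponential(size=40)             # hypothetical observations W_i
omega = np.full(W.size, 1.0 / W.size)    # hypothetical weights, summing to 1
theta0 = 0.8                             # hypothetical theta_0, with w(x) = x
U = theta0 * W - 1.0

lam0 = solve_lambda(omega, U)
p = omega / (1.0 + lam0 * U)
print(lam0, p.sum(), np.sum(p * U))
```

Because $\sum_i\omega_i/(1+\lambda U_i)=\sum_i\omega_i-\lambda\psi(\lambda)$, the resulting $p_i$ sum to one whenever the $\omega_i$ do, so the single equation $\psi(\lambda_0)=0$ enforces both constraints.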



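Using Taylor's expansion on $\psi(\lambda)$, we have that from $\psi(\rho_1;\tilde\theta_n)=0$ in
(3.7),

\[
\begin{aligned}
\psi'(\xi)(\rho_1-\lambda_0) &= \psi(\rho_1)-\psi(\lambda_0)
= \sum_{i=1}^m\frac{\omega_iU_i}{1+\rho_1U_i}\\
&= \sum_{i=1}^m\frac{\omega_i[\theta_0w(W_i)-1]}{\rho_0+\rho_1\theta_0w(W_i)}
 -\sum_{i=1}^m\frac{\omega_i[\tilde\theta_nw(W_i)-1]}{\rho_0+\rho_1\tilde\theta_nw(W_i)}\\
&= \sum_{i=1}^m\frac{\omega_iw(W_i)(\theta_0-\tilde\theta_n)}{[\rho_0+\rho_1\xi_iw(W_i)]^2},
\end{aligned}
\tag{A.17}
\]

where $\xi$ is between $\rho_1$ and $\lambda_0$, and $\xi_i$ is between $\theta_0$ and $\tilde\theta_n$. From (AS4),
integration by parts and Theorem 1(i), we know that (A.17) implies

\[
(\rho_1-\lambda_0) = (\theta_0-\tilde\theta_n)\left(\frac{1}{\psi'(\rho_1)}\sum_{i=1}^m\frac{\omega_iw(W_i)}{[\rho_0+\rho_1\theta_0w(W_i)]^2}+o_p(1)\right).
\tag{A.18}
\]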

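Also using Taylor's expansion, we have

\[
\begin{aligned}
\ln[\rho_0+\rho_1\theta_0w(W_i)] &= \ln(1+\rho_1U_i)
= \ln(1+\lambda_0U_i)+\frac{U_i}{1+\lambda_0U_i}(\rho_1-\lambda_0)\\
&\quad-\frac{U_i^2}{2(1+\lambda_0U_i)^2}(\rho_1-\lambda_0)^2+\frac{U_i^3}{3(1+\eta_iU_i)^3}(\rho_1-\lambda_0)^3,
\end{aligned}
\tag{A.19}
\]

\[
\begin{aligned}
\ln[\rho_0+\rho_1\theta_0w(W_i)] &= \ln[\rho_0+\rho_1\tilde\theta_nw(W_i)]
+\frac{\rho_1w(W_i)}{\rho_0+\rho_1\tilde\theta_nw(W_i)}(\theta_0-\tilde\theta_n)\\
&\quad-\frac{[\rho_1w(W_i)]^2}{2[\rho_0+\rho_1\tilde\theta_nw(W_i)]^2}(\theta_0-\tilde\theta_n)^2
+\frac{[\rho_1w(W_i)]^3}{3[\rho_0+\rho_1\zeta_iw(W_i)]^3}(\theta_0-\tilde\theta_n)^3,
\end{aligned}
\tag{A.20}
\]

\[
\ln(\tilde\theta_n/\theta_0) = -\frac{\theta_0-\tilde\theta_n}{\tilde\theta_n}+\frac{(\theta_0-\tilde\theta_n)^2}{2\tilde\theta_n^2}-\frac{(\theta_0-\tilde\theta_n)^3}{3\zeta^3},
\tag{A.21}
\]

where $\eta_i$ is between $\rho_1$ and $\lambda_0$, while $\zeta_i$ and $\zeta$ are between $\theta_0$ and $\tilde\theta_n$.
Since (A.18) and Theorem 1(ii) imply $(\rho_1-\lambda_0)=O_p(n^{-1/2})$, then from
$\sum_{i=1}^m(\omega_iU_i)/(1+\lambda_0U_i)=0$ and $\tilde\theta_n\sum_{i=1}^m[\omega_iw(W_i)]/[\rho_0+\rho_1\tilde\theta_nw(W_i)]=\tilde\theta_n\sum_{i=1}^m\tilde p_iw(W_i)=1$,
and by applying (A.19)-(A.21) to (A.16), we obtain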

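\[
\begin{aligned}
\ln r(\theta_0) &= O_p(n^{-1/2})-\frac{n(\rho_1-\lambda_0)^2}{2}\sum_{i=1}^m\frac{\omega_iU_i^2}{(1+\lambda_0U_i)^2}\\
&\quad-\frac{n(\tilde\theta_n-\theta_0)^2}{2}\left(\frac{\rho_1}{\tilde\theta_n^2}-\sum_{i=1}^m\frac{\omega_i[\rho_1w(W_i)]^2}{[\rho_0+\rho_1\tilde\theta_nw(W_i)]^2}\right).
\end{aligned}
\tag{A.22}
\]

Hence, the proof follows from Theorem 1(ii) and applying (A.18) to (A.22),
where the limits of the coefficients of $n(\tilde\theta_n-\theta_0)^2$ are handled similarly to
(A.8). □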
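
As a side remark on the proof above, the last equality in (A.17) rests on a one-line derivative computation combined with the mean value theorem in $\theta$. The sketch below writes $w$ for $w(W_i)$ and assumes $\rho_0+\rho_1=1$, which is what makes $1+\rho_1U_i$ equal to $\rho_0+\rho_1\theta_0w(W_i)$ in (A.19).

```latex
\[
\frac{d}{d\theta}\,\frac{\theta w-1}{\rho_0+\rho_1\theta w}
 = \frac{w(\rho_0+\rho_1\theta w)-\rho_1 w(\theta w-1)}{(\rho_0+\rho_1\theta w)^2}
 = \frac{(\rho_0+\rho_1)\,w}{(\rho_0+\rho_1\theta w)^2}
 = \frac{w}{(\rho_0+\rho_1\theta w)^2}.
\]
```

Evaluating this derivative at an intermediate point $\xi_i$ between $\theta_0$ and $\tilde\theta_n$ gives each summand $\omega_iw(W_i)(\theta_0-\tilde\theta_n)/[\rho_0+\rho_1\xi_iw(W_i)]^2$.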